Stream Model

Data Management Vs. Stream Management

  • In a DBMS, input arrives either via SQL INSERT statements or bulk loaders.
    • It is under the control of the programmer or DBA.
  • In stream management, the input rate is controlled by external factors.
    • Example: Google search queries.

The Stream Model

  • Input tuples enter at a rapid rate, at one or more input ports.
  • The system cannot store the entire stream accessibly.
  • The challenge is to make critical calculations about the stream using a limited amount of memory.

Two Forms of Query

  1. Ad-hoc queries: normal queries asked one time about streams.
    • Example: What is the maximum value seen so far in stream S?
  2. Standing queries: queries that are asked about the stream at all times.
    • Example: Report each new maximum value ever seen in stream S.
    • Example: List the days on which a stock exceeded its 52-week high.
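
The two query forms can be sketched in Python using the maximum-value examples above. This is a minimal illustration, not a prescribed implementation: it assumes the stream is an ordinary iterable of numbers, and the names `RunningMax` and `new_maxima` are hypothetical. The only state kept is the current maximum, so memory use does not grow with the stream.

```python
from typing import Iterable, Iterator, Optional


class RunningMax:
    """Keeps only the largest value seen so far (constant memory)."""

    def __init__(self) -> None:
        self.value: Optional[float] = None  # None until the first element arrives

    def update(self, x: float) -> bool:
        """Feed one stream element; return True if it set a new maximum."""
        if self.value is None or x > self.value:
            self.value = x
            return True
        return False


def new_maxima(stream: Iterable[float]) -> Iterator[float]:
    """Standing query: yield each element that sets a new maximum."""
    state = RunningMax()
    for x in stream:
        if state.update(x):
            yield x
```

For example, `list(new_maxima([3, 1, 4, 1, 5, 9, 2, 6]))` yields `[3, 4, 5, 9]`. The ad-hoc query "what is the maximum seen so far?" is answered by reading `RunningMax.value` at the moment the question is asked, which works only because that summary was being maintained as elements arrived.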

Applications

  • Mining query streams
    • Which products do users search for most frequently on a shopping site?
  • Mining click streams
    • Which product pages are being visited most often?
  • IP packets can be monitored at a switch.
    • Gather information for optimal routing.
    • Detect denial-of-service attacks.

Sliding Windows

  • A useful model of stream processing is that queries are about a window of length N: the N most recent elements received.
    • Alternative: elements received within a time interval T.
  • Interesting case: N is so large it cannot be stored in main memory.
    • Or, there are so many streams that windows for all cannot be stored.
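
Both window definitions can be sketched in Python as follows. This is an illustrative sketch only: the class names `CountWindow` and `TimeWindow` are hypothetical, timestamps are assumed to be numeric and non-decreasing, and the whole window is held in memory, so it covers only the easy case, not the one where N is too large to store.

```python
from collections import deque


class CountWindow:
    """Holds the N most recent elements received."""

    def __init__(self, n: int) -> None:
        # A deque with maxlen discards the oldest item when a new one
        # is appended to a full window.
        self.items = deque(maxlen=n)

    def add(self, x) -> None:
        self.items.append(x)


class TimeWindow:
    """Holds the elements received within the last t time units."""

    def __init__(self, t: float) -> None:
        self.t = t
        self.items = deque()  # (timestamp, element) pairs, oldest first

    def add(self, timestamp: float, x) -> None:
        self.items.append((timestamp, x))
        # Evict every element that has fallen out of the interval.
        while self.items and self.items[0][0] <= timestamp - self.t:
            self.items.popleft()
```

After adding 0, 1, 2, 3, 4 to a `CountWindow(3)`, its contents are 2, 3, 4. A `TimeWindow(10)` that receives elements at times 0, 5, and 12 retains only the last two, because the first is more than 10 time units old when the third arrives.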

Example: Averages

  • Stream of integers.
  • Window of size N.
  • Standing query: what is the average of the integers in the window?
  • For the first N inputs, sum and count to get the average.
  • Afterward, when a new input i arrives, change the average by adding (i - j)/N, where j is the oldest integer in the window, which i displaces.
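
The update rule above can be sketched in Python. This is a minimal illustration under stated assumptions: the class name `WindowAverage` is hypothetical, and the window itself is kept in a deque so that the oldest integer j is available when it is displaced.

```python
from collections import deque
from typing import Optional


class WindowAverage:
    """Maintains the average of the N most recent integers in a stream."""

    def __init__(self, n: int) -> None:
        if n <= 0:
            raise ValueError("window size must be positive")
        self.n = n
        self.window = deque()  # oldest element on the left
        self.total = 0         # used only while the window is filling
        self.average: Optional[float] = None

    def add(self, i: int) -> float:
        if len(self.window) < self.n:
            # First N inputs: sum and count to get the average.
            self.window.append(i)
            self.total += i
            self.average = self.total / len(self.window)
        else:
            # Window is full: drop the oldest integer j and
            # adjust the average by (i - j) / N.
            j = self.window.popleft()
            self.window.append(i)
            self.average += (i - j) / self.n
        return self.average
```

With N = 3 and inputs 1, 2, 3, 7, the averages are 1.0, 1.5, 2.0, and then 2.0 + (7 - 1)/3 = 4.0, which matches the average of the window 2, 3, 7. Each update takes constant time, but the N stored integers are still required to know j. Over a very long stream the repeated floating-point additions may accumulate rounding error, so recomputing the average from the window occasionally could be a reasonable safeguard.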